(57) ABSTRACT

Multimedia information retrieval is performed using meta- 
descriptors in addition to descriptors. A "descriptor" is a 
representation of a feature, a "feature" being a distinctive 
characteristic of multimedia information, while a "meta- 
descriptor" is information about the descriptor. Meta- 
descriptors are generated for multimedia information in a 
repository (10, 12, 14, 16, 18, 20, 22, 24) by extracting the 
descriptors from the multimedia information (111), cluster- 
ing the multimedia information based on the descriptors 
(112), assigning meta-descriptors to each cluster (113), and 
attaching the meta-descriptors to the multimedia informa- 
tion in the repository (114). The multimedia repository is 
queried by formulating a query using query-by-example 
(131), acquiring the descriptor/s and meta-descriptor/s for a 
repository multimedia item (132), generating a query 
descriptor/s if none of the same type has been previously 
generated (133, 134), comparing the descriptors of the 
repository multimedia item and the query multimedia item 
(135), and ranking and displaying the results (136, 137). 

6 Claims, 2 Drawing Sheets 
USING META-DESCRIPTORS TO
REPRESENT MULTIMEDIA INFORMATION 

BACKGROUND OF THE INVENTION 

The present invention relates to content-based processing
of multimedia data, and more particularly to creation and use 
of attributes of multimedia data that are descriptive of the 
content thereof. 

Multimedia information typically exists in various inho- 
mogeneous forms, including, for example, digital, analogue 
(e.g., VCR magnetic tape and audio magnetic tape), optical 
(e.g., conventional film), image (e.g., pictures and drawings 
on paper), and so forth. The ability to locate this multimedia 
information is important in modern society, and is particularly
important in various professional and consumer applications
such as, for example, education, journalism (e.g.,
searching speeches of a certain politician using his name, his 
voice or his face), tourist information, cultural services (e.g., 
history museums, art galleries, and so forth), entertainment 
(e.g., searching for a game or for karaoke titles), investiga- 
tion services (e.g., human characteristics recognition and 
forensics), geographical information systems, remote sens- 
ing (e.g., cartography, ecology, natural resources 
management, and so forth), surveillance (e.g., traffic control, 
surface transportation, non-destructive testing in hostile 
environments, and so forth), biomedical applications, shop- 
ping (e.g., searching for clothes that you like), architecture, 
real estate, interior design, social (e.g., dating services), and 
film, video and radio archives. Unfortunately, present systems
are not thorough, quick or efficient in searching multimedia
information; see, e.g., International Organisation for
Standardisation ISO/IEC JTC1/SC29/WG11 Coding of
Moving Pictures and Audio, MPEG-7 Applications Document
V.8, No. N2728, March 1999, which is hereby incorporated
herein by reference in its entirety.

An important step in support of searching multimedia 
information is to represent it in a form that is searchable 
using modern computer systems. Much interest has been
expressed in developing forms of audio-visual information
representation that go beyond the simple waveform or 
sample-based representations, the compression-based repre- 
sentations such as MPEG-1 and MPEG-2, and the object- 
based representations such as MPEG-4, and that can be 
passed onto, or accessed by, a device or a computer code. 
Numerous proprietary solutions have been developed for 
describing multimedia content and for extracting the repre- 
sentations and querying the resulting collections of 
representations, but these have only proliferated yet more 
heterogeneous multimedia information and exacerbated the 
difficulties of conducting quick and efficient searches of 
multimedia information. 

A "descriptor" is a representation of a feature, a "feature"
being a distinctive characteristic of multimedia information 
regardless of the media or technology of the multimedia 
information and regardless of how the multimedia informa- 
tion is stored, coded, displayed, and transmitted. Since 
descriptors used in different proprietary multimedia infor- 
mation retrieval systems are not necessarily compatible, 
interest has been expressed in creating a standard for 
describing multimedia content data that will support the 
operational requirements of computational systems that 
create, exchange, retrieve, and/or reuse multimedia infor- 
mation. Examples include computational systems designed 
for image understanding (e.g., surveillance, intelligent 
vision, smart cameras), media conversion (e.g., speech to 
text, picture to speech, speech to picture), and information
retrieval (quickly and efficiently searching for various types
of multimedia documents of interest to the user) and filtering 
(to receive only those multimedia data items which satisfy 
the user's preferences) in a stream of audio-visual content 
description. 

Accordingly, a need exists for a standard for describing 
multimedia content data that will support these operational 
requirements as well as other operational requirements yet to 
be developed. 

SUMMARY OF THE INVENTION 

Accordingly, an object of the present invention as realized
in particular embodiments is to improve the efficiency of
retrieval of multimedia information from a repository.
Another object of the present invention as realized in
particular embodiments is to improve the speed of retrieval
of multimedia information from a repository.
Yet another object of the present invention as realized in
particular embodiments is to provide a standard representation
of a feature of multimedia information.

These and other objects are achieved in the various 
embodiments of the present invention. For example, one 
embodiment of the present invention is a method of representing
a plurality of multimedia information, comprising
acquiring descriptors for the multimedia information, gen- 
erating at least one meta-descriptor for the descriptors, and 
attaching the at least one meta-descriptor to the multimedia 
information. 

Another embodiment of the present invention is a method
of representing a plurality of multimedia information which
collectively is of various content types, comprising acquiring
descriptors for the multimedia information, generating
clusters of the descriptors, generating meta-descriptors for
the clusters, and respectively attaching the meta-descriptors
for the clusters to items of the multimedia information
described by the descriptors in the clusters.
A further embodiment of the present invention is a
method of searching multimedia information in a repository
described by descriptors using a query multimedia information
item, comprising acquiring meta-descriptors of the
repository descriptors, selecting query multimedia
information, extracting at least one query descriptor from the
query multimedia information based on the meta-descriptors
to obtain at least one query descriptor, comparing the query
descriptor with the repository descriptors, and ranking at
least some of the multimedia information in the repository in
accordance with the comparing step.

Another embodiment of the present invention is a method
of retrieving multimedia information from a repository,
comprising extracting repository descriptors from the multimedia
information in the repository, generating clusters of
the repository descriptors, indexing the repository descriptors
to the multimedia information in the repository, generating
meta-descriptors for the clusters, attaching the meta-descriptors
for the clusters to the respective multimedia
information in the clusters, selecting query multimedia
information, extracting at least one descriptor from the query
multimedia information based on the meta-descriptors to
obtain at least one query descriptor, comparing the query
descriptor with the repository descriptors, and ranking at
least some of the multimedia information in the repository in
accordance with the comparing step.

A further embodiment of the present invention is a data
structure for representing information about a plurality of
descriptors that are representations of features of an item of
multimedia information belonging to a particular category of
multimedia content, comprising a plurality of data elements
indicating relevancy of the descriptors in describing the item
of multimedia information.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a pictorial representation of various forms of 
multimedia data in a repository. 

FIG. 2 is a flowchart of a meta-descriptor generation 
process and a multimedia query process, in accordance with 
the present invention. 

FIG. 3 is a table of records for an illustrative relational 
database, in accordance with the present invention. 

FIG. 4 is a flowchart of a process for refining meta- 
descriptors for multimedia in a repository, in accordance 
with the present invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

Examples of the various forms in which multimedia data 
may exist are shown in FIG. 1, and include multiple occur- 
rences of, for example, graphics 10, still images 12, video 
14, film 16, music 18, speech 20, sounds 22, and other media 
24. These multiple occurrences may be stored differently, 
coded differently, transmitted differently, exist on different 
media, or have been produced with different technologies. 
The multimedia data may be stored in one place or distrib- 
uted throughout the world; for example, digitized multime- 
dia of interest to a user may be stored in a self-contained 
relational or object-oriented data base, or in separate inde- 
pendent data bases implemented in different technologies 
and stored on different proprietary computers scattered 
throughout the world and accessible only over the Internet. 
Indeed, non-digital multimedia of interest to a user similarly 
may be stored in one collection under the control of a single 
entity, or widely scattered in different collections under the 
control of different entities. Regardless of the form in which 
the multimedia information exists and how it is stored, the 
user would prefer to view the collection of multimedia 
information as a single repository, as shown by the reference 
numeral 1, for purposes of efficiently searching for specific 
multimedia data. 

We have found that multimedia information retrieval that 
uses meta-descriptors in addition to descriptors is not only 
efficient in identifying multimedia information but is also 
able to identify multimedia information that has been rep- 
resented in a variety of different ways. A "descriptor" is a 
representation of a feature, a "feature" being a distinctive 
characteristic of multimedia information, while a "meta- 
descriptor" is information about the descriptor. A meta- 
descriptor is different from but related to the general concept 
of meta-data, which is a well known way of embedding 
additional information. For example, meta-data in docu- 
ments may include format of the images in the document, 
and meta-data in a database may include value constraints or 
statistical information for an attribute in a relation. 
Specifically, a meta-descriptor for an item of multimedia 
information identifies those parts of a descriptor for that item 
of multimedia information that contain the most useful 
information for identifying that item of multimedia infor- 
mation. The concept of meta-descriptor is based on the 
premise that a given multimedia information item is best 
qualified to know what describes it best, and indicating this 
information greatly enhances content based retrieval. 
Advantageously, meta-descriptors enable computerized 
searches for multimedia information to be done more 
quickly due to the generally smaller size of meta-descriptors,
as well as more efficiently due to the elimination of less 
relevant information. 
Although many multimedia retrieval techniques are
adaptable to the use of meta-descriptors, a preferred technique
for searching multimedia information using meta-descriptors
is the scenario search, or query-by-example. In a
query-by-example search of, illustratively, images, a particular
still image is specified as the basis for querying the
repository. The query is specified either by the initiator of the
query, which may be a human user or an automated process,
for example, or by retrieval algorithms used in the retrieval
process. The query is formed based on a feature or features
indicated in the meta-descriptors of the multimedia information
in the repository to be important. Descriptors for the
feature or features are extracted from the query multimedia
information and compared with descriptors extracted from
the repository multimedia information to obtain similarity
measures, which are used to select one or more "matching"
items of multimedia information. In some instances, meta-descriptors
for the repository multimedia information may
immediately indicate an obvious and large dissimilarity in
content, thereby obviating the computations to perform
extraction of descriptors and comparison of features for the
query and the particular repository multimedia information.
If the user is not an information retrieval expert, preferably
the particular feature or features used in the retrieval of
matching multimedia information are transparent to the user
for simplicity of use.
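
The early rejection mentioned above, in which a meta-descriptor alone shows that a repository item cannot match, might be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the dissimilarity is detected, so the sketch assumes that each item (including the query) carries a bit-string mask over four still-image features in a fixed order, and that "no shared relevant feature" counts as an obvious dissimilarity. The names FEATURES, relevant, worth_comparing and candidates are hypothetical.

```python
# Assumed fixed feature order for still images (hypothetical).
FEATURES = ("color", "shape", "texture", "sketch")


def relevant(mask):
    """Return the set of features flagged with '1' in a bit-string mask."""
    return {feature for feature, bit in zip(FEATURES, mask) if bit == "1"}


def worth_comparing(query_mask, repo_mask):
    """True when the query and the repository item share a relevant feature.

    Assumption: items with no relevant feature in common are treated as
    obviously dissimilar, so descriptor extraction and comparison are skipped.
    """
    return bool(relevant(query_mask) & relevant(repo_mask))


def candidates(query_mask, repo_masks):
    """Keep only the repository items that pass the cheap pre-filter."""
    return [name for name, mask in repo_masks.items()
            if worth_comparing(query_mask, mask)]


# Illustrative data only: a color/texture query against three items.
repo = {"beach": "1010", "drawing": "0001", "poster": "1000"}
survivors = candidates("1010", repo)
```

With this illustrative data, the sketch-only "drawing" item is dropped before any descriptor is extracted or compared, while "beach" and "poster" remain as candidates.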

FIG. 2 is a flowchart showing processes for generating
meta-descriptors for repository multimedia information and
for performing a query of the repository. Although meta-descriptors
may be used for any type of multimedia
information, the example illustrated in FIG. 2 is based on
still images to facilitate the description. A method 110 for
generating meta-descriptors is illustrated by the principal
steps 111-114, and a method 130 for querying a multimedia
repository such as shown in FIG. 1 is illustrated by the
principal steps 131-137. The meta-descriptor generation
method 110 is an unsupervised or automated method of
machine learning, although meta-descriptors may also be
generated by formalizing user input by a human or by a
hybrid of semi-automatic techniques. The query method 130
preferably is automated except for the user's formulation of
a query. Various aspects of descriptor generation and multimedia
information retrieval are well known and are
described in various publications, including, for example,
Yong Rui, Thomas S. Huang, and Shih-Fu Chang, Image
Retrieval: Past, Present, and Future, Journal of Visual
Communication and Image Representation, 10, 1-23 (1999);
Sharad Mehrotra, Yong Rui, Michael Ortega-Binderberger,
and Thomas S. Huang, Supporting Content-based Queries
over Images in MARS, Proceedings of the IEEE International
Conference on Multimedia Computing and Systems,
Jun. 3-6, 1997, Chateau Laurier, Ottawa, Ontario, Canada,
1997, pp. 632-633; Sharad Mehrotra, Yong Rui, Kaushik
Chakrabarti, Michael Ortega-Binderberger, and Thomas S.
Huang, Multimedia Analysis and Retrieval System,
Proceedings of the 3rd International Workshop on Information
Retrieval Systems, Como, Italy, Sep. 25-27, 1997, pp.
39-45; and Patrick M. Kelly, Michael Cannon, and Donald
R. Hush, Query by Image Example: the CANDID Approach,
in SPIE Vol. 2420 Storage and Retrieval for Image and
Video Databases III, 1995, pp. 238-248, which are incorporated
herein by reference in their entirety.

The first step 111 in the meta-descriptor generating 
method 110 is extraction of descriptors from multimedia 
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("MM") information. Descriptor extraction algorithms and d-dimensional metric space, the proximity can be distance 

software as well as indexing algorithms and software are between pairs of points, such as Euclidean distance, 

well known in the art., and examples are described in the Typically, the proximity matrix is the one and only input to 

aforementioned publications. Except for constraints a clustering algorithm. The objects being clustered could be 

imposed by any relevant standards, any one or combination 5 different species of plants, pixels in a digital image or 

of descriptor extraction techniques may be used as desired. documents on different topics. Cluster analysis finds several 

Typically, descriptors extracted from still images, for applications in pattern recognition and image processing, 

example, include color, shape, texture and sketch, although Cluster analysis is used for feature selection, in applications 

the specific descriptor extracted in any one instance depends involving unsupervised learning, and in aiding machine 

on the software used in an automated process, the actions of 10 learning and knowledge representation in artificial intelli- 

the user in a manual process, or the interactions of the user gence. 

and the software in a hybrid process. Automated processes Descriptors extracted from still images typically are vec- 
are advantageous since they can process large amounts of t0fS of multidimensional numbers representing a collection 
multimedia information in the repository without requiring of points jn fa ^ simple case> the chisterillg alg0 . 
significant human intervention. Typically, automated pro- 35 rithm compares d^ces in a collection of points in 2D 
cesses that operate on still images do not operate on a gpace to delermine how close the points are . In more 
semantic level, which is to say that they do not describe a advarjced algorithms, the concept is extended to multidi- 
family portrait in such terms, but rather produce values of mensional Xo ensure consistency, the clustering pro- 
color, shape, texture and sketch for perhaps the entire still ^ ^ Hed ferabl to only multimed ia information 
image, or perhaps for multiple blocks into which the image 20 processed with the ^ extraction algorithm. For still 
has been divided. images, clustering is based on similarity of typically low 
The next step 112 is to cluster the multimedia information level features; for example, certain images from which clear 
based on the descriptors, although other techniques may be patterns emerge for a particular color and texture (for 
used if desired and examples are described in the aforemen- example, such as would be generated from a beach scene) 
tioned publications. Essentially, clustering is grouping of 2 5 but with otherwise indefinite results for sketch and shape 
similar multimedia information from a large mixed data may be considered to be in the same cluster. Unsupervised 
set— clustering is not needed for small repositories of same- clustering algorithms typically work iteratively, refining 
content type information— based on certain criteria applied tneir results until a threshold point specified by the user is 
to the descriptors. A cluster is a set of entities which are achieved. 

alike, and entities from different clusters are not alike. 30 n * * hi • . • * j * i_ 

r . « . . . The next step 113 is to assign meta-descnptors to each 

Extraction of meta-descnp tors may be done by supervised or , 4 ~ * . , * 1 r 1 

. , , 4 . 4 t . c * j • * u cluster. For example, where a pattern emerges only for color, 

unsupervised clustering. Extraction of meta-descnptors by . , • * * 1 1 ™m ■ • j * lu 1 * 

. , , . r . 1.. i.j- a meta-descnp tor or color 100% is assigned to the cluster, 

supervised clustering mvolves clustering multimedia con- * , c , . T , • , u t 

r , , . . & . ° . Where a pattern emerges only for sketch, which would be 

tent based on its features, given a set or cluster representa- . , c 7 J j r 

, , . y & . • , 1 i_ 1 , * expected for mono -chromatic engineenng drawings, for 

tives that have been previously assigned a label or a descnp- 35 , , , . , c 1 < tinrw • • , u 

™ . . ,1 .7 • j.Liu. example, a meta-descnp tor of sketch 100% is assigned to the 

tor. The images in each cluster are then assigned the label or , . r r , , . f . - 

. , . 1 6 r . A . , , . . • * e cluster. For the particular cluster of still images of the 

the descriptor of that cluster s representative. Extraction of . " ,. , , ^ 1 * 

j . 1 . j .. . . * , previous example m which clear patterns emerge only for a 

meta^escnptors by unsupervised clustering involves clus- Articular and ( ^a-descriptor of color 50% 

termg multimedia content based on each descnbed feature. v 5Q% ed ^ ^ 

For example, a set of images have a first cluster represen- 40 4 j -j • L * ^ • j • u 

. . * . . . ; 4 j 1 * automation is desired, weights may be assigned using heu- 

tation based on their color features and have a second cluster . , . - . . ^ . . . % e 

, . • , r ^.jr nstic rules, which are based on statistical information from 

representation based on their texture features. Based, for , . n- j- ■ r 

r 1 A , . r.t. i > r a.c \ past experiences with multimedia information. 

example, on the comparison of the clusters for each feature r AU ,. r . . , . . , , „ ., 

. r ' j * ■ * 1 1 * l Alternatively, meta-descnptors may be assigned manually, if 

using programmed metncs to calculate how clearly denned • • » *• 1. * 

, , A r L r r 4 desired, or in a semi-automatic way with human interaction, 

and compact the clusters are, one feature or a lew reatures 45 ^ de sired 

are found to outperform others in describing a given image. 

For example, a certain image may belong to a very compact Met a- descriptors may take whatever form is convenient 

and clearly defined cluster in the set of clusters based on for the Programmer. In one particularly compact form, the 

color features, but may belong to a cluster with a wide meta-descnp tor is a binary vector X, each bit x, indicating 

spread and overlap in the set of clusters based on texture 50 the relevance (V*' feature * relevant) of a feature given a 

features. The color feature accordingly is chosen as the fixed number of ordered features for that category of mul- 

metandescriptor for the image, since it classifies the image timedia content - Iri the case of a stlU iraa S e > for example, a 

better than the texture feature in their respective feature suitable vector * a four bit vector in which a binar y 1 or 0 

spaces indicates the importance or inelevancy, respectively, of 

Mathematically, a cluster is an aggregation of points in the 55 co ] or ' sha P c ' and sketcb ™ describin S ^ multimedia 

test space such that the distance between any two points in information. If only color is important in a particular still 

the cluster is less than the distance between any point in the ima & e > a sultable mcta-descnptor is 1000. 

cluster and any point not in it. See, e.g., Anil K. Jain and A notation that is able to assign specific weights uses a 

Richard C. Dubes, Algorithms for Clustering Data, Prentice weighted vector X, each element of the vector x,- indicating 

Hall Advanced Reference Series, 1988, p. 1. Cluster analysis 60 ^e weight assigned to the i th feature given a fixed number 

is the process of classifying objects into subsets that have of ordered features for that category of multimedia content, 

meaning in the context of a particular problem. The objects ^ say color and sketch are both important but have different 

are thereby organized into an efficient representation that weights, a suitable meta -descriptor of this type is "70,0,0, 

characterizes the population being sampled. The relationship 30 " indicating that color has a 70% weight and sketch has a 

between objects is represented in a proximity matrix in 65 ^0% weight. 

which rows and columns correspond to objects. If the Another form is string notation, which is capable of 

objects are characterized as patterns, or points in handling not only different weights but also different extrac- 
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tion algorithms and sectioning of the still image into mul- information to determine the values in the meta-descriptor. 
tiple blocks. Moreover, the string representation allows for For example, where the multimedia content is an image, 
new features to be considered in the meta-descriptor later in features from the image are combined with semantic infor- 
time. In string notation, each character or group of charac- mation from the text caption associated with the image to 
ters in the string indicates the relevance of a feature given a 5 determine the values in the meta-descriptor for the image, 
set of features in accordance with a predefined standard FIG. 3 is an example of a simple multimedia information 
notation. Consider, for example, a set of four valid color table for a relational data base file (any other type of 
descriptor types: (1) a single histogram for an entire image database is suitable as well) that uses various attachment 
in RGB color space; (2) twenty-five histograms for the techniques. Illustratively, the table has five fields, a multi- 
image in RGB color space that is divided into a 5x5 grid, media information number field is MM _JNFO_NO, a 
each of the resulting twenty-five blocks being represented by descriptor value field DV, a meta-descriptor value field 
a histogram; (3) a single histogram for the entire image in MDV, a multimedia file source field MM_SOURCE, and a 
YUV color space; and (4) twenty-five histograms for the comment field COMMENT. The MM_INFO_J^O field is a 
image in YUV color space that is divided into a 5x5 grid, primary key field. The DV and MDV fields are character 
each of the resulting twenty-five blocks being represented by fields for containing, for example, string vectors. The 
a histogram. Assume that these descriptor types are numeri- MM__SOURCE field is an OLE data type that links to or 
cally ordered from 1 to «n," n being the number of valid embeds °^ objects such as digMzed document^ oxawings, 
descriptor types, here four. A suitable string meta-descriptor P lctures > a fl nc L S0 T ? , ? h \ T1 f ^T^/ V 
for a still image that is best described by, for example, the ^^mo data type field If desired the DV field may be 
n * j r t j * „ • arv>\ a» • *u omitted from the table provided descriptors either are 
first and fourth color descriptors is C214 meaning: the , , . 4 , n- j- • r *u * • 1 

, . . 1 ... embedded in the multimedia information or the retrieval 

co or feature is relevant (C) and is obtained with two (2) g ^ extracted descri tors &om ^ multimedia ^ 

color representations from a pre-defined set of color How6V6r> havi dcscriptors in a ]ocal databas6 ^ 

representations, namely the first and fourth (14) color rep- aUow ^ retrieva i system t0 operat e more quickly, 

mentations from the pre-defined set of color representa- ^ record iien ^ d b ^e ^ k MM01 contains 

lions String notation is particularly flexible allowing not ^ the descri tor value DVO i and the meta-descriptor value 

only different color spaces (for example, RGB and YUV) to MDV01 and fa aUached to a di i(ized s , m { stored ^ 

be identified but also allowing each color space to be F]LE01 b a ^ jn {he QLE data Md ^ wconJ 

calculated differently (for example, as one block, a set of ten identified by the primary key MM02 contains the meta- 

blocks, a set of 100 blocks, and so forth). Extensions of descri tor vahle MDV 02, and is attached to a digitized still 

string notation can also handle different extraction algo- 30 i mage stored in FILE02 by a link in the OLE data type field, 

rithms by appropriate predefined codes. ^ descriptor va]ue 

is extracted from the content of FILE02 

Meta-descriptors of different forms may be used for during the process of quer yi ng the multimedia information 

different multimedia information, and any information not m me rep0 sitory. T^he record identified by the primary key 

provided for in the particular form of meta-descriptor can be MM03 cont ai D s the descriptor value DV03 and the meta- 

furnished by default. For example, if the default descriptor 3S descriptor value MDV03 for a block of multimedia infor- 

extraction method and the default color space are used, a mation in an aod ^ attached to a digitized still image 

binary meta-descnptor is adequate. stored in FILE 03 by a link in the OLE data type field. The 

Jhe next step 114 is to attach meta-descriptors to mul ti- record identified by the primary key MM04 contains the 

media information based on cluster mlormation. A variety of descriptor value DV04 and the meta-descriptor value 

different "attachment" techniques are well known and may 40 MDV04 for another block of multimedia information in the 

be selected for use based on the media type and manner of same image, and is attached to a digitized still image stored 

accessing it, and different attachment types may be used m FILE03 by a link in the OLE data type field. The record 

within a particular data base of meta-descriptors. The identified by the primary key MM05 is attached to a digi- 

descriptors themselves may or may not be present, although ^d still image stored in FILE05 by a link in the OLE data 

if they ar e not pre sent the system must know how to 45 type field. The DV and MDV fields are null for this record, 

calcula te lliem, eiiliei by default 0 1 by a val ue in the smce the DV and MDV are embedded in the linked file and 

meta-descriptor notation. Preferably, at least the
meta-descriptors and their attachment data are stored in
storage 120, which may be any type of data base accessible to
the system. Descriptors may be stored in the storage 120 or
stored with the multimedia information from which they were
extracted. The multimedia information itself resides in a
repository (FIG. 1) which may be as specific as other memory
space in the storage device 120 or as diverse as the internet,
or even so diverse as to include non-electronic forms of
storage such as paper.

Once a feature is chosen to be present in the meta-descriptor
for, say, an image, the meta-descriptor may if desired allow
for the presence of other features from a pre-defined set. To
reduce the number of features to be tested for, a set of
association rules derived from a labeled training set may be
used, if appropriate. For example, a particular repository may
contain multimedia information that is not well described by
sketch, so that the retrieval system would not need to use
sketch.

During extraction, features from a multimedia content may be
combined, if desired, with higher level semantic

can easily be read from it. The record identified by the
primary key MM06 is attached to a digitized still image stored
in another data base accessible over the Internet by a URL
link in the OLE data type field. The DV and MDV fields are
null for this record, since the DV and MDV are embedded in the
linked file and can be read from it. The record identified by
the primary key MM07 contains the meta-descriptor value MDV07,
and is attached to a VCR tape. The DV field is null for this
record, since the descriptor value is embedded in the vertical
blanking interval on the VCR tape and can be read from it. The
MM_SOURCE field is null for this record. Unless the retrieval
system detects from the meta-descriptor that an obvious and
large dissimilarity in content exists between the query
multimedia and the VCR tape, the tape must be mounted and the
descriptor must be read from the VCR tape during a query. The
record identified by the primary key MM08 contains the
descriptor value DV08 and the meta-descriptor value MDV08, and
is attached to a still image printed on photographic paper and
filed in drawer 08. The MM_SOURCE field is null for this
record.
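The three records described above can be sketched as rows with nullable fields. This is a minimal illustration only: the field names DV, MDV and MM_SOURCE and the keys MM06 to MM08 come from the text, while the `Record` class, the `descriptor_location` helper, and the URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One row of the attachment table; None stands for a null field."""
    key: str                    # primary key, e.g. "MM06"
    dv: Optional[str]           # descriptor value, or None if held elsewhere
    mdv: Optional[str]          # meta-descriptor value, or None if held elsewhere
    mm_source: Optional[str]    # link to the media, or None for offline media


def descriptor_location(rec: Record) -> str:
    """Say where the descriptor value has to be read from for this record."""
    if rec.dv is not None:
        return "storage"            # DV is held in the table itself
    if rec.mm_source is not None:
        return "linked file"        # DV is embedded in the file behind the link
    return "physical medium"        # DV must be read from the medium, e.g. a tape


records = [
    Record("MM06", None, None, "http://example.invalid/mm06"),  # hypothetical URL
    Record("MM07", None, "MDV07", None),                        # VCR tape
    Record("MM08", "DV08", "MDV08", None),                      # photographic print
]
locations = {r.key: descriptor_location(r) for r in records}
```

In this sketch only MM07 forces a read from the physical medium, which is why the text has the retrieval system consult the meta-descriptor first before mounting the tape.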
The method 130 for querying a multimedia repository such as
shown in FIG. 1 is illustrated by the principal steps 131-137.
Step 131 is the formation of a query by the user using any
convenient method, here query-by-example. In query-by-example,
the user selects an item of multimedia information and wishes
to find all matching items of multimedia information from the
repository. In step 132, descriptors and meta-descriptors for
an item of multimedia information in the repository are
retrieved, illustratively from the storage 120. The
descriptors may instead be stored with the multimedia
information and therefore may have to be retrieved from the
multimedia information, or they may be unavailable and have to
be extracted again based on the values in the
meta-descriptors. If the descriptor for the repository
multimedia information item is of a type not previously
processed in the query 130 (step 133: YES), a corresponding
descriptor is extracted from the query multimedia item (step
134) by applying the extraction method and weights indicated
by the meta-descriptor for the repository multimedia
information item. A comparison (step 135) is then made between
the query descriptor and the descriptor for the repository
multimedia information item. Features given no weight in the
meta-descriptor for the repository multimedia information item
need not be processed for meta-descriptor extraction. The
comparison is repeated for all clusters in the database (step
136: NO), and the set of closest matches from each cluster is
appropriately ranked, with suitable means being well known in
the art, and displayed to the user (step 137).
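The query steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method itself: the repository is a flat list rather than a set of clusters, descriptors are plain numeric vectors compared by Euclidean distance, and the names `query_by_example`, `extractors`, and `query_cache` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Extractor = Callable[[dict], Vector]


def query_by_example(query_item, repository, extractors: Dict[str, Extractor], top_k=3):
    """Rank repository items by weighted descriptor distance to the query item.

    repository: list of (item_id, descriptors, weights), where descriptors maps
    a feature name to a vector and weights is that item's meta-descriptor.
    """
    query_cache: Dict[str, Vector] = {}    # query descriptors extracted so far (step 133)
    scored: List[Tuple[float, str]] = []
    for item_id, descriptors, weights in repository:          # step 132
        total = 0.0
        for feature, weight in weights.items():
            if weight == 0:                # features given no weight are skipped
                continue
            if feature not in query_cache:                     # step 134
                query_cache[feature] = extractors[feature](query_item)
            distance = math.dist(query_cache[feature], descriptors[feature])
            total += weight * distance                         # step 135
        scored.append((total, item_id))
    scored.sort()                          # closest match first
    return [item_id for _, item_id in scored[:top_k]]


# Toy data: the "extractors" simply look up precomputed vectors.
extractors = {
    "color": lambda item: item["color"],
    "shape": lambda item: item["shape"],
}
repository = [
    ("A", {"color": [0.0, 0.0], "shape": [1.0]}, {"color": 1.0, "shape": 0.0}),
    ("B", {"color": [3.0, 4.0], "shape": [0.0]}, {"color": 1.0, "shape": 0.0}),
    ("C", {"color": [9.0, 9.0], "shape": [0.0]}, {"color": 0.0, "shape": 1.0}),
]
query = {"color": [0.0, 0.0], "shape": [0.5]}
ranking = query_by_example(query, repository, extractors)
```

Each query descriptor is extracted at most once per feature type, and item C is never compared on color because its meta-descriptor gives color no weight.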

A technique for optimizing meta-descriptors that involves 
formalizing user input by a human expert is shown in FIG. 
4. For clarity in the description, a simple repository of still 
images is presumed. Such images typically are classifiable 
in just a few categories, for example, human figures, plants, 
landscapes, and textiles, and allow a content-based retrieval 
by a few methods such as color, shape, texture, and sketch. 
A given image in the database is best described by one or 
more of these features and poorly described by other fea- 
tures. For example, a human figure is best described by 
shape, plants are best described by color and texture, and 
landscapes are best described by texture alone. A database 
having these characteristics is trained using human input as 
follows. In step 301, a meta-descriptor generation process is 
performed using all of the features, illustratively color, 
shape, texture, and sketch — steps 112-114 of FIG. 2 are 
illustrative of such a process. In step 302, a multimedia 
query process is performed using all of the features, illus- 
tratively color, shape, texture, and sketch — steps 131-136 of 
FIG. 2 are illustrative of such a process. In step 303, the 
results are ranked by the retrieval system and displayed to 
the human expert. From the closest matches, the user 
determines which method suits the query image best, or if 
more than one method suits the query image, the user 
determines the weights for each of the suitable features 
using a suitable criterion. The user also indicates all of the 
other images in the sets of closest matches that are to be 
given the same weights for the suitable features. In step 306, 
the retrieval system updates the value of the meta- 
descriptors, for example, by assigning new weights for 
features, based on the human expert's input. Any of various 
iterative learning techniques may be used. An image that has 
not been considered at all in the training may be assigned 
equal weights for all the features. Steps 301, 302, 303 and 
306 are repeated until the human expert is satisfied with the 
results, in which event the meta-descriptors are optimized 
and the process 300 ends. 
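One possible form of the weight update in step 306 is sketched below. The text leaves the learning technique open, so the blending rule, the `rate` parameter, and the function names here are assumptions chosen only for illustration.

```python
FEATURES = ("color", "shape", "texture", "sketch")


def equal_weights():
    """Default for an image that was never considered during training."""
    return {f: 1.0 / len(FEATURES) for f in FEATURES}


def update_weights(current, expert, rate=0.5):
    """Move the stored weights toward the expert's weights, then renormalise.

    `rate` is a hypothetical learning rate; features the expert did not
    mention are treated as having an expert weight of 0.
    """
    blended = {f: (1 - rate) * current[f] + rate * expert.get(f, 0.0) for f in FEATURES}
    total = sum(blended.values())
    return {f: w / total for f, w in blended.items()}


weights = equal_weights()
expert_choice = {"shape": 1.0}   # the expert judges shape to suit this image best
for _ in range(3):               # stands in for repeating steps 301, 302, 303 and 306
    weights = update_weights(weights, expert_choice)
```

After each pass the weight for shape grows and the others shrink, while the weights continue to sum to one.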

Meta-descriptors are most effective when incorporated into a
system of standards for descriptors, descriptor schemes, and
descriptor definition languages, although the particular
standardization scheme is not critical. An illustrative view
of what a descriptor is and how it functions in a multimedia
information retrieval system is set forth in: International
Organisation for Standardisation ISO/IEC JTC1/SC29/WG11 Coding
of Moving Pictures and Audio, MPEG-7 Requirements Document
V.8, No. N2727, March 1999; and International Organisation for
Standardisation ISO/IEC JTC1/SC29/WG11 Coding of Moving
Pictures and Audio, MPEG-7: Context, Objectives and Technical
Roadmap, V.11, No. N2729, March 1999, which are hereby
incorporated herein by reference in their entirety. According
to the view expressed in these documents, searching of 
multimedia information is performed by comparing 
"descriptors" and their instantiations ("descriptor values"), a 
descriptor being a representation of a "feature" of the 
multimedia information and a feature being a distinctive 
characteristic of the multimedia information that signifies 
something to somebody. A descriptor defines the syntax and 
the semantics of the feature representation. If desired, sev- 
eral descriptors may be used to represent a single feature, as 
when different relevant requirements need to be addressed. 
For example, possible descriptors for the color feature are: 
the color histogram, the average of the frequency 
components, the motion field, the text of the title, and so 
forth. Descriptor values are combined via the mechanism of 
a "description scheme" to form a "description." In particular, 
a description scheme ("DS") specifies the structure and 
semantics of the relationships between its components, 
which may be both descriptors and description schemes, and 
a description consists of a DS (structure) and the set of 
descriptor values (instantiations) that describe the multime- 
dia data. A description definition language ("DDL") is a 
language that allows the creation of new description 
schemes and, possibly, descriptors. It also allows the exten- 
sion and modification of existing description schemes. Table 
1, which is taken from the aforementioned MPEG-7 
Requirements Document V.8 (modified to include a sketch 
feature), exemplifies the distinction between a feature and its 
descriptors. 

TABLE 1

Feature Category Type     Feature Label                 Descriptor Data Type
------------------------  ----------------------------  ------------------------------
Spatial                   texture of an object,         a set of wavelet coefficients,
                          shape of an object            a set of polygon vertices
Temporal                  trajectory of objects         chain code
Objective                 color of an object            color histogram, rgb vector,
                                                        text
                          shape of an object            a set of polygon vertices,
                                                        a set of moments
                          texture of an object          a set of wavelet coefficients,
                                                        the set of contrast,
                                                        coarseness and directionality
                          sketch of an object           set of edges
                          audio frequency content       average of frequency
                                                        components
Subjective                emotion (happiness, angry,    a set of eigenface parameters,
                          sadness)                      text
                          style                         text
                          annotation                    text
Production                author                        text
                          producer                      text
                          director, etc.                text, etc.
Composition information   scene composition             tree graph
Concepts                  event                         text
                          activity                      text, a numeric value
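The relationship between a descriptor, a description scheme, and a description can be sketched with three small data types. This is a hedged illustration of the terminology above rather than any MPEG-7 syntax; the class names and the example scheme names "still image" and "outline" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Descriptor:
    """Names a feature and the data type used to represent it."""
    feature: str      # e.g. "color of an object"
    data_type: str    # e.g. "color histogram"


@dataclass
class DescriptionScheme:
    """Structure relating components, which may be descriptors or nested schemes."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)

    def descriptors(self) -> List[Descriptor]:
        """Flatten the scheme into the descriptors it contains, in order."""
        found: List[Descriptor] = []
        for c in self.components:
            found.extend(c.descriptors() if isinstance(c, DescriptionScheme) else [c])
        return found


@dataclass
class Description:
    """A scheme (structure) plus the descriptor values that instantiate it."""
    scheme: DescriptionScheme
    values: Dict[Descriptor, object]


color = Descriptor("color of an object", "color histogram")
edges = Descriptor("sketch of an object", "set of edges")
still_image = DescriptionScheme("still image", [color, DescriptionScheme("outline", [edges])])
description = Description(still_image, {color: [0.2, 0.5, 0.3], edges: [((0, 0), (1, 1))]})
```

The two descriptors reuse rows of Table 1: one feature ("color of an object") paired with one of its possible data types ("color histogram"), and likewise for sketch.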



General requirements for descriptors and description schemes
as proposed in the aforementioned MPEG-7 Requirements Document
are supported by suitable descriptors and their
meta-descriptors. Multiple types of features: suitable
descriptors and their meta-descriptors support multimedia
descriptions using various types of features such as
N-dimensional spatio-temporal structure (e.g., the duration of
a music segment), objective features (e.g., the number of beds
in a hotel, color of an object, shape of an object, audio
pitch, etc.), subjective features (e.g., how nice, happy or
fat someone is, topic, style, etc.), production features
(e.g., information about document creation such as the date of
acquisition, producer, director, performers, roles, production
company, production history, any non-IPMP production
information), composition information (e.g., how a scene is
composed, editing information, the user's preferences, etc.),
and concepts (e.g., event, activity). Abstraction levels for
the multimedia material: hierarchical mechanisms to describe
multimedia documents at different levels of abstraction are
supported, which accommodates users' needs for information at
differing levels of abstraction such as, for example, the
composition of objects from sub-objects, a sequence by
sequence analysis of motion in a video, and the plot structure
of a video. Cross-modality: audio, visual, or other
descriptors and their meta-descriptors that allow queries
based on visual descriptions to retrieve audio data and vice
versa are supported (for example, where the query is an
excerpt of Pavarotti's voice and the result is retrieval of
video clips where Pavarotti is singing and where Pavarotti is
present). Multiple descriptions: the ability to handle
multiple descriptions of the same material at several stages
of its production process is supported, as well as
descriptions that apply to multiple copies of the same
material. Description scheme relationships: suitable
description schemes express the relationships between
descriptors and their meta-descriptors to allow for their use
in more than one description scheme. The capability to encode
equivalence relationships between descriptors and their
meta-descriptors in different description schemes is
supported. Descriptor priorities: the prioritization of
descriptors and their meta-descriptors preferably is supported
by the description schemes so that queries may be processed
more efficiently. The priorities may reflect levels of
confidence or reliability. Descriptor hierarchy: suitable
description schemes support the hierarchical representation of
different descriptors and their meta-descriptors in order that
queries may be processed more efficiently in successive levels
where N level descriptors complement (N-1) level descriptors.
Descriptor scalability: suitable description schemes support
scalable descriptors with their meta-descriptors in order that
queries may be processed more efficiently in successive
description layers. Description of temporal range: association
of descriptors with their meta-descriptors to different
temporal ranges is supported, both hierarchically (descriptors
with their meta-descriptors are associated to the whole data
or a temporal sub-set of it) as well as sequentially
(descriptors with their meta-descriptors are successively
associated to successive time periods). Direct data
manipulation: descriptors and their meta-descriptors acting as
handles referring directly to the data are supported, to allow
manipulation of the multimedia material. Language of
text-based descriptions: suitable descriptors with their
meta-descriptors specify the language used in the description
and support all natural languages. Translations in text
descriptions: suitable text descriptions provide a way to
contain translations into a number of different languages, in
order to convey the relation between the description in the
different languages.
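The descriptor hierarchy requirement, in which level N descriptors complement level (N-1) descriptors, might be exploited at query time roughly as sketched here. The level-by-level pruning rule, the per-level `thresholds`, and the function name are assumptions for illustration and are not specified in the text.

```python
import math


def hierarchical_search(query_levels, items, thresholds):
    """Filter items level by level, coarsest descriptor first.

    query_levels: list of vectors, one per level (level 0 is the coarsest).
    items: dict mapping item_id to a list of vectors with the same layout.
    thresholds: maximum allowed distance at each level (hypothetical tuning values).
    """
    survivors = list(items)
    for level, limit in enumerate(thresholds):
        # Only items that passed the coarser levels are compared at this level.
        survivors = [
            item_id for item_id in survivors
            if math.dist(query_levels[level], items[item_id][level]) <= limit
        ]
    return survivors


items = {
    "A": [[0.0], [0.0, 0.0]],
    "B": [[0.1], [5.0, 5.0]],
    "C": [[9.0], [0.0, 0.0]],
}
query = [[0.0], [0.0, 0.1]]
result = hierarchical_search(query, items, thresholds=[1.0, 1.0])
```

Item C is rejected on the cheap coarse descriptor and never reaches the finer comparison, which is the efficiency gain the requirement points to.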

Functional requirements for descriptors and description
schemes as proposed in the aforementioned MPEG-7 Requirements
Document are supported by suitable descriptors and their
meta-descriptors. Retrieval effectiveness: the effective
retrieval of multimedia material is supported. Retrieval
efficiency: the efficient retrieval of multimedia material is
supported. Similarity-based retrieval: descriptions that allow
database content to be rank-ordered by the degree of
similarity with the query are supported. Associated
information: the association of other information with the
data is supported. Streamed and stored descriptions: both
streamed (synchronized with content) and non-streamed data
descriptions are supported. Distributed multimedia databases:
the simultaneous and transparent retrieval of multimedia data
in distributed databases is supported. Referencing analogue
data: the ability to reference and describe multimedia
documents in analogue format is supported (for example,
providing temporal references to sequences in a VHS tape).
Interactive queries: mechanisms to allow interactive queries
are supported. Linking: mechanisms allowing source data to be
located in space and in time are supported, including links to
related information. Prioritization of related information:
mechanisms allowing the prioritization of related information,
mentioned under Linking above, are supported. Browsing:
descriptions that allow information content to be previewed,
in order to help users overcome their unfamiliarity with the
structure and/or types of information, or to clarify their
undecided needs, are supported. Associate relations: relations
between components of a description are supported.
Interactivity support: means for specifying the interactivity
related to a description are supported (for example,
tele-voting related to broadcast events). Intellectual
property information: inclusion of copyright, licensing and
authentication information related to Ds, DSs and descriptions
is supported.

Visual specific requirements for descriptors and description
schemes as proposed in the aforementioned MPEG-7 Requirements
Document are supported by suitable descriptors and their
meta-descriptors. Type of features: visual descriptions are
supported that allow the following features (mainly related to
the type of information used in the queries), namely color,
visual objects, texture, sketch, shape, still and moving
images (e.g., thumbnails), volume, spatial relations, motion,
deformation, source of visual object and its characteristics
(e.g., the source object, source event, source attributes,
events, event attributes, and typical associated scenes), and
models (e.g., MPEG-4 SNHC). Data visualization using the
description: a range of multimedia data descriptions with
increasing capabilities in terms of visualization is supported
(allows a more or less sketchy visualization of the indexed
data). Visual data formats: description of the following
visual data formats is supported, namely digital video and
film, such as MPEG-1, MPEG-2 or MPEG-4; analogue video and
film; still pictures in electronic (such as JPEG), paper or
other format; graphics, such as CAD; 3D models, notably VRML;
and composition data associated to video. Description of other
visual data formats yet to be defined is possible. Visual data
classes: descriptions specifically applicable to the following
classes of visual data are supported, namely natural video,
still pictures, graphics, animation (2-D), three-dimensional
models, and composition information.

Audio specific requirements for descriptors and description
schemes as proposed in the aforementioned MPEG-7 Requirements
Document are supported by suitable descriptors and their
meta-descriptors. Type of features: audio descriptions are
supported that allow the following features (mainly related to
the type of information used in the queries), namely frequency
contour (general trend, melodic contour), audio objects,
timbre, harmony, frequency profile, amplitude envelope,
temporal structure (including rhythm), textual content
(typically speech or lyrics), sonic approximations
(vocalization of a sonic sketch by, for example, humming a
melody or growling a sound effect), prototypical sound
(typical query-by-example), spatial structure (applicable to
multi-channel sources, stereo, 5.1-channel, and binaural
sounds each having particular mappings), source of sound and
its characteristics (e.g., the source object, source event,
source attributes, events, event attributes, and typical
associated scenes), and models (e.g., MPEG-4 SAOL). Data
sonification using the description: a range of multimedia data
descriptions with increasing capabilities in terms of
sonification is supported. Auditory data formats: description
of the following types of auditory data is supported, namely
digital audio (e.g., MPEG-1 Audio, Compact Disc), analogue
audio (e.g., vinyl records, magnetic tape media), MIDI
including General MIDI and Karaoke formats, model-based audio
(e.g., MPEG-4's Structured Audio Orchestra Language, SAOL),
and production data. Auditory data classes: descriptions
specifically applicable to the following sub-classes of
auditory data are supported, namely soundtrack (natural audio
scene), music, atomic sound effects (e.g., clap), speech,
symbolic audio representations (MIDI, SNHC Audio), and mixing
information (including effects).

Coding requirements for descriptors and description schemes
as proposed in the aforementioned MPEG-7 Requirements Document
are supported by suitable descriptors and their
meta-descriptors. Description efficient representation: the
efficient representation of data descriptions is supported.
Description extraction: the use of Descriptors and Description
Schemes that are easily extractable from uncompressed and
compressed data, according to several widely used formats, is
supported by the meta-descriptors. Robustness to information
errors and loss: mechanisms that guarantee graceful behavior
of the system in the case of transmission errors are
supported.

While text specific requirements for descriptors and 
description schemes are not proposed in the aforementioned 
MPEG-7 Requirements Document, suitable descriptors and 
their meta-descriptors support the ability of multimedia 
content to include or refer to text in addition to audio-visual 
information, provided the text descriptions and the interface 
allow queries based on audio-visual descriptions to retrieve
text data and vice versa, and that the descriptions of text for 
text-only documents and composite documents containing 
text should be the same. 

While in some situations in which meta-descriptors are used,
the search engine or filter agent (user side) may have to know
the exact feature extraction algorithm employed by the
meta-description generation process, the specific algorithm
used for feature extraction during the description generation
process is otherwise not relevant to the meta-description
generation process. Hence, the meta-description process is
able to accommodate technological developments in feature
extraction and encourage competitive development.

The description of the invention and its applications as set 
forth herein is illustrative and is not intended to limit the 
scope of the invention as set forth in the following claims. 
Variations and modifications of the embodiments disclosed 
herein are possible, and practical alternatives to and equiva- 
lents of the various elements of the embodiments are known 
to those of ordinary skill in the art. These and other varia- 
tions and modifications of the embodiments disclosed herein 
may be made without departing from the scope and spirit of 
the invention as set forth in the following claims. 

What is claimed is: 

1. A method of representing a plurality of multimedia 
information, comprising: 

acquiring descriptors for the multimedia information; 
generating clusters of the descriptors; 
generating at least one meta-descriptors for the clusters; 
and 

attaching the at least one meta-descriptor to the multime- 
dia information, including respectively attaching the 
meta-descriptors for the clusters to items of the multi- 
media information described by the descriptors in the 
clusters, 

wherein the meta-descriptor generating step comprises 
generating respective groups of data elements for each 
of the clusters indicating relevancy of the descriptors 
therein, 

wherein at least some of the descriptors are representa- 
tions of features of an item of multimedia information 
belonging to a category of multimedia content, the 
features comprising an ordered set of features including 
color, texture, shape and sketch, and the category of 
multimedia content being still image; and 

the meta-descriptor generating step comprises generating 
respective binary vectors for each of the clusters indi- 
cating relevancy of the descriptors therein. 

2. A method of representing a plurality of multimedia 
information, comprising: 

acquiring descriptors for the multimedia information; 
generating clusters of the descriptors; 
generating at least one meta-descriptors for the clusters; 
and 

attaching the at least one meta-descriptor to the multime- 
dia information, including respectively attaching the 
meta-descriptors for the clusters to items of the multi- 
media information described by the descriptors in the 
clusters, 

wherein the meta-descriptor generating step comprises 
generating respective groups of data elements for each 
of the clusters indicating relevancy of the descriptors 
therein, and 

wherein at least some of the descriptors are representa- 
tions of features of an item of multimedia information 
belonging to a category of multimedia content, the 
features comprising an ordered set of features including 
color, texture, shape and sketch, and the category of 
multimedia content being still image; and 

the meta-descriptor generating step comprises generating 
respective groups of weight values for each of the 
clusters indicating respective weights for the descrip- 
tors therein. 
3. A method of representing a plurality of multimedia
information, comprising: 

acquiring descriptors for the multimedia information; 
generating clusters of the descriptors; 
generating at least one meta-descriptors for the clusters; 
and 

attaching the at least one meta-descriptor to the
multimedia information, including respectively attaching
the meta-descriptors for the clusters to items of the
multimedia information described by the descriptors in the
clusters,

wherein the meta-descriptor generating step comprises
generating respective groups of data elements for each
of the clusters indicating relevancy of the descriptors
therein, and wherein:

at least some of the descriptors are representations of
features of an item of multimedia information
belonging to a category of multimedia content, the
features comprising an ordered set of features
including color, texture, shape and sketch, and the
category of multimedia content being still image; and

the meta-descriptor generating step comprises
generating respective character strings for each of the
clusters identifying at least one relevant feature having
a predetermined set of representation types, and at
least one of the representation types from the
predetermined set of representation types.

4. A method of representing a plurality of multimedia 
information, comprising:

acquiring descriptors for the multimedia information; 

generating at least one meta-descriptors for the descrip- 
tors and a group of data elements indicating relevancy 
of the descriptors therein; and 

attaching the at least one meta-descriptor to the multime- 
dia information wherein 

at least some of the descriptors are representations of 
features of an item of multimedia information 
belonging to a category of multimedia content, the
features comprising an ordered set of features includ- 
ing color, texture, shape and sketch, and the category 
of multimedia content being still image; and 
the meta-descriptor generating step comprises
generating binary vectors indicating relevancy of the
descriptors therein. 

5. A method of representing a plurality of multimedia 
information, comprising: 

acquiring descriptors for the multimedia information; 

generating at least one meta-descriptors for the descrip- 
tors and a group of data elements indicating relevancy 
of the descriptors therein; and 

attaching the at least one meta-descriptor to the multime- 
dia information wherein 

at least some of the descriptors are representations of 
features of an item of multimedia information 
belonging to a category of multimedia content, the 
features comprising an ordered set of features includ- 
ing color, texture, shape and sketch, and the category 
of multimedia content being still image; and 

the meta-descriptor generating step comprises generat- 
ing a group of weight values indicating weights for 
the descriptors therein. 

6. A method of representing a plurality of multimedia 
information, comprising: 

acquiring descriptors for the multimedia information; 

generating at least one meta-descriptors for the descrip- 
tors and a group of data elements indicating relevancy 
of the descriptors therein; and 

attaching the at least one meta-descriptor to the multime- 
dia information wherein 

at least some of the descriptors are representations of 
features of an item of multimedia information 
belonging to a category of multimedia content, the 
features comprising an ordered set of features includ- 
ing color, texture, shape and sketch, and the category 
of multimedia content being still image; and 

the meta-descriptor generating step comprises generat- 
ing a character string identifying at least one relevant 
feature having a predetermined set of representation 
types, and at least one of the representation types 
from the predetermined set of representation types. 
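The three meta-descriptor forms recited in the claims (binary vectors, weight values, and character strings) can be illustrated over the ordered feature set as follows. This is a non-authoritative sketch: the helper names and the "feature:representation" string layout are assumptions, not language from the claims.

```python
FEATURES = ("color", "texture", "shape", "sketch")   # ordered set named in the claims


def as_binary_vector(relevant):
    """1 where the feature's descriptor is relevant, else 0, in feature order."""
    return [1 if f in relevant else 0 for f in FEATURES]


def as_weight_vector(weights):
    """One weight per feature in the fixed feature order; missing means 0.0."""
    return [float(weights.get(f, 0.0)) for f in FEATURES]


def as_character_string(feature, representation):
    """Name a relevant feature and one of its representation types.

    The "feature:representation" layout is a hypothetical encoding.
    """
    if feature not in FEATURES:
        raise ValueError(f"unknown feature: {feature}")
    return f"{feature}:{representation}"


binary = as_binary_vector({"color", "texture"})
weighted = as_weight_vector({"color": 0.7, "texture": 0.3})
label = as_character_string("color", "histogram")
```

Because the feature set is ordered, a position in the binary or weight vector is enough to identify which feature it refers to.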

***** 






